Inventi Impact: Modeling & Simulation

Articles

Inventi:ems/45622/22

Person Localization Model Based on a Fusion of Acoustic and Visual Inputs

01-Jul-2022 Research 2022 : July-September

Leon Koren, Tomislav Stipancic, Andrija Ricko, Luka Orsag

PLEA is an interactive, biomimetic robotic head with non‐verbal communication capabilities. PLEA reasoning is based on a multimodal approach combining video and audio inputs to determine the current emotional state of a person. PLEA expresses emotions using facial expressions generated in real‐time, which are projected onto a 3D face surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The model for audio‐visual person separation can locate a talking person in a crowded place by combining input from a ResNet network with input from a hand‐crafted algorithm. The first input is used to find human faces in the room, and the second input is used to determine the direction of the sound and to focus attention on a single person. An information fusion procedure then matches the face of the person speaking with the corresponding sound direction. As a result of this procedure, the robot can start an interaction with the person based on non‐verbal signals. The model was tested and evaluated under laboratory conditions through interaction with users. The results suggest that the methodology can be used efficiently to focus a robot’s attention on a localized person.
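
The fusion step described in the abstract, matching a detected face to a sound direction, can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole camera model, the `Face` type, the function names, and the default image width, field of view, and tolerance are all assumptions made for illustration. The sketch converts each face's horizontal pixel position to an azimuth angle and picks the face closest to the estimated sound azimuth.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Face:
    """Hypothetical face detection: box center column (pixels) and a label."""
    x_center: float
    label: str


def face_azimuth_deg(x_center: float, image_width: int, hfov_deg: float) -> float:
    """Azimuth of a pixel column relative to the optical axis (pinhole model, assumed)."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg / 2))
    return math.degrees(math.atan((x_center - image_width / 2) / focal_px))


def match_speaker(
    faces: Sequence[Face],
    sound_azimuth_deg: float,
    image_width: int = 640,       # assumed camera resolution
    hfov_deg: float = 60.0,       # assumed horizontal field of view
    tolerance_deg: float = 10.0,  # assumed maximum allowed angular mismatch
) -> Optional[Face]:
    """Return the face whose azimuth is closest to the sound direction, or None."""
    best: Optional[Face] = None
    best_err = tolerance_deg
    for face in faces:
        azimuth = face_azimuth_deg(face.x_center, image_width, hfov_deg)
        err = abs(azimuth - sound_azimuth_deg)
        if err <= best_err:
            best, best_err = face, err
    return best
```

With faces centered at columns 100, 320, and 540 of a 640-pixel-wide image, a sound arriving from about 20 degrees to the right would select the face at column 540, while a sound from 45 degrees would match no face under the assumed 10-degree tolerance.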

How to Cite this Article
Attribution/CC Compliant Citation: Koren, Leon, et al. "Person Localization Model Based on a Fusion of Acoustic and Visual Inputs." Electronics 11.3 (2022): 440. https://doi.org/10.3390/electronics11030440. License: http://creativecommons.org/licenses/by/4.0/. Some formatting elements, header, footer, logos, dates and pagination were modified while adapting this article.